Our world is rapidly urbanizing in practically all geographies. In 2018, the UN reported that 55% of the global population lived in urban areas and projected that number to rise to 68% by 2050. With this urbanization, access to green space and parkland is certain to decrease across geographies and across populations. With this general decline in access to nature, there is likely to be an increase in the inequity of access, with the wealthy and privileged losing access at a lesser rate than the poor and underprivileged. Work has been done to assess the public health impacts of this inequity in access to nature in the context of COVID (Spotswood et al., 2021), as well as on the “luxury effect,” which shows that wealthy neighborhoods can support higher biodiversity in some cases, due largely to the increased presence of parkland (Magel et al., 2021). In this analysis, I seek to understand the relationship between access to green space and exposure to potentially harmful pollutants. I will also examine the relationship between green space and income. The analysis focuses on the Bay Area of California, a collection of nine counties surrounding San Francisco Bay.
Intuitively, it makes sense to hypothesize that, in an urban setting like the Bay Area, there would be a relationship between exposure to pollution and contamination and the relative urbanization of a location. We see natural landscapes as clean and urban hardscapes as dirty, and can make assumptions as to which location is healthier to live next to. Below I quantify that relationship using the spatial distribution of parks in the Bay Area, summary data from the CalEnviroScreen model, and Census Block, BlockGroup, and Tract geometries.
In developing this analysis, four key datasets are leveraged: 1) US Census American Community Survey (ACS) Income Dataset, 2) US Census TIGER Geometries, 3) CalEnviroScreen 4.0, and 4) Trust for Public Land ParkServe Parks Dataset.
In this analysis, geospatial work is performed on multiple scales, and scaled up or down according to the needs of a particular section of the analysis. In each of the below summaries of the individual datasets, I have described scaling needs and procedures as applicable. For spatial relationships, all distances are presented in linear feet.
Table B19001 from the US Census Bureau is used to determine income levels in each Census BlockGroup. Income levels in this table are presented as discrete counts of households within certain income bands. To account for this arrangement of the data, I define the variable “income” for later regressions as the percent of surveyed households that make more than $100k annually (USD). The threshold of $100k is chosen a) as a clean break that is well-defined by the natural arrangement of the income categories and b) because the mean annual income of the Bay Area is roughly $100k.
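As an illustration of the income variable described above, the following is a minimal sketch, not the code used in this analysis. It assumes the B19001 estimates are available under the Census API column names (B19001_001E for the total and B19001_014E through B19001_017E for the four bands at or above $100,000); the function name and the example counts are hypothetical.

```python
# Assumed Census API-style column names for Table B19001 estimates.
TOTAL = "B19001_001E"            # all households in the BlockGroup
OVER_100K = [
    "B19001_014E",               # $100,000 to $124,999
    "B19001_015E",               # $125,000 to $149,999
    "B19001_016E",               # $150,000 to $199,999
    "B19001_017E",               # $200,000 or more
]


def share_over_100k(row):
    """Percent of households in the bands at or above $100k.

    Returns None when the BlockGroup reports no households,
    so that empty geographies are not treated as 0%.
    """
    total = row[TOTAL]
    if total == 0:
        return None
    return 100.0 * sum(row[col] for col in OVER_100K) / total


# Hypothetical BlockGroup: 800 households, 400 of them above the threshold.
example = {
    "B19001_001E": 800,
    "B19001_014E": 120,
    "B19001_015E": 80,
    "B19001_016E": 100,
    "B19001_017E": 100,
}
pct = share_over_100k(example)
```

Because the $100k threshold coincides with a band boundary, no interpolation within a band is needed; a threshold such as $90k would require an assumption about how households are distributed inside the $75,000 to $99,999 band.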
Originally, this analysis was intended to be performed with building footprints provided by the city of San Francisco. However, for practical purposes, census geometries were chosen instead. First, the building footprints dataset is extremely large and detailed, and very long processing times presented a limitation early in the development of this analysis, prompting the switch to a dataset with lower resolution and greater spatial extent. Second, the building footprints included residences, commercial structures, government structures, and retail structures. Given the urban setting, many (possibly most) of the residential structures were multi-family, which would bias the analysis against people living in apartments, duplexes, etc. Third, because CalEnviroScreen and the ACS income data are presented at the Census Tract and BlockGroup levels, respectively, the building footprints would need to be rolled up to coarser spatial units anyway. Fourth, because of the roll-up requirement and the limited extent of the building footprints being used (San Francisco), rolling up to Census geometries would diminish the sample size for later regressions.
Therefore, the decision was made to focus on Census BlockGroups (for the ACS Income analysis) and Census Tracts (for the CalEnviroScreen analysis) and to expand the analysis to the entire Bay Area of California. This allows for a more generalizable analysis and a more meaningful sample set. When developed, the Blocks dataset was filtered to remove all Blocks with a land area value of 0.0, which eliminates Blocks (such as those in San Francisco) that consist entirely of water. Following this filter, the centroid was calculated for each Block, and this centroid was used to calculate the distance between a Block and the nearest park boundary. For the BlockGroups, the distances were averaged across the Blocks contained, and this average was assigned as the distance for that BlockGroup. For Tracts, the distances were again averaged across the BlockGroups contained.
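The filter, distance, and averaging steps above can be sketched as follows. This is a standard-library illustration under stated assumptions, not the code used in this analysis: coordinates are assumed to be already projected to a planar system measured in feet, each park is reduced to a single outer ring, and all function names, field names, and sample values are hypothetical. The roll-up relies on the standard Census GEOID layout, in which the first 12 characters of a 15-character Block GEOID identify its BlockGroup and the first 11 identify its Tract.

```python
from collections import defaultdict
from math import hypot
from statistics import mean


def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:  # degenerate segment: a and b coincide
        return hypot(px - ax, py - ay)
    # Project p onto the segment's line, then clamp to the endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_boundary(p, ring):
    """Distance from p to the closest edge of a closed polygon ring."""
    edges = zip(ring, ring[1:] + ring[:1])
    return min(point_segment_distance(p, a, b) for a, b in edges)


def roll_up(distances, prefix_len):
    """Average child distances by the first prefix_len GEOID characters."""
    groups = defaultdict(list)
    for geoid, dist in distances.items():
        groups[geoid[:prefix_len]].append(dist)
    return {parent: mean(vals) for parent, vals in groups.items()}


# Hypothetical inputs: one square park and four Blocks (coordinates in feet).
parks = [[(0, 0), (100, 0), (100, 100), (0, 100)]]
blocks = [
    {"geoid": "060750101001000", "aland": 5000, "centroid": (400, 50)},
    {"geoid": "060750101001001", "aland": 8000, "centroid": (50, 600)},
    {"geoid": "060750101002000", "aland": 0, "centroid": (900, 900)},  # all water
    {"geoid": "060750101002001", "aland": 3000, "centroid": (-1000, 0)},
]

# 1) Drop Blocks with no land area. 2) Distance to the nearest park boundary.
block_dist_ft = {
    b["geoid"]: min(distance_to_boundary(b["centroid"], ring) for ring in parks)
    for b in blocks
    if b["aland"] > 0
}

# 3) Mean over Blocks per BlockGroup, then mean over BlockGroups per Tract.
blockgroup_dist_ft = roll_up(block_dist_ft, 12)
tract_dist_ft = roll_up(blockgroup_dist_ft, 11)
```

Two design points are worth noting. Measuring to the boundary means that a centroid falling inside a park still receives a positive distance, whereas a point-to-polygon distance would return zero. In addition, the Tract value is a mean of BlockGroup means, so it weights each BlockGroup equally regardless of how many Blocks it contains and can differ from a direct mean over all Blocks in the Tract.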
CalEnviroScreen 4.0 is used as a simplified index of general environmental contaminant exposure. While the index incorporates a variety of stressors and pollutants, the summary index was chosen as the most appropriate indicator, given the wide variety of communities and potential exposures throughout the sample area. The CalEnviroScreen 4.0 data is presented at the Census Tract level, and therefore the summary value is compared against the mean distance to a park calculated for each Tract from the Block centroids it contains.
The CalEnviroScreen 4.0 summary score is a modeled summary of 21 indicators that are designed to account for four risk areas. In the risk area of Exposures, the score summarizes exposure of the population to Ozone, PM2.5, Diesel PM, Drinking Water Contaminants, Toxic Facility Releases, Traffic, Housing Lead Risk to Children, and Pesticides. The Environmental Effects risk area presents the exposure to Cleanup Sites, Solid Waste Facilities, Groundwater Contamination, Impaired Water Bodies, and Hazardous Waste. The Sensitive Populations risk area assesses the prevalence of Asthma, Cardiovascular Disease, and Low Birth Weight Infants in a population. Finally, the risk area for Socioeconomic Factors incorporates rates of Educational Attainment, Housing Burden, Linguistic Isolation, Poverty, and Unemployment. Together, these 21 indicators are summarized in a model according to this report, and a summary score is generated. This summary score, while limited in its utility as all indexes are, presents a useful metric for understanding general risk across various and diverse geographies, like those examined in this analysis. Moreover, the CalEnviroScreen summary score has been codified into California State policy, such as SB 535, which uses the summary score to designate disadvantaged communities. Precedent exists, therefore, for using this summary score across large geographies, and it is useful to understand these relationships using a metric that is formally adopted as a decision-making tool by the California government.
Park boundaries were downloaded and incorporated as shapefiles provided by the Trust for Public Land ParkServe Program. No modifications were made to the park boundaries dataset (with the exception of re-projection), and all data was retained for the nine counties encompassing the Bay Area. Importantly, data were not included for the counties bordering the Bay Area, and therefore it is possible that Blocks on the outer boundary of some Bay Area counties were attributed a “nearest park” that is farther away than the actual nearest park, if that park lies in a county that was not included. This is assumed to affect a negligible portion of the analysis, given the sample size.